Observation and classification of device events

ABSTRACT

Systems and methods observe and classify device events. A model containing a set of features to be observed can be determined based on machine learning and training methods. A client application can issue a transaction request to an operating system service. A determination can be made whether the operating system service, a method associated with the transaction request, and the client application are currently being observed. In response to determining that the operating system service, a method associated with the transaction request, and the client application are being observed, a behavioral vector associated with the client application can be modified to indicate that the feature represented by the method is associated with the client application. The behavioral vector can be used to determine if the client application is malware.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the priority benefit of U.S. Provisional Application Ser. No. 62/565,917, filed on Sep. 29, 2017, entitled “OBSERVATION AND CLASSIFICATION OF DEVICE EVENTS,” the entire disclosure of which is incorporated herein by reference.

FIELD

The disclosure relates generally to systems and methods for device security, and more particularly, to observing and classifying device events.

BACKGROUND

Over time, smart phones have become more capable and their use has increased. However, as a result of this increasing use, smart phones have become a more attractive target for malware. Malware, short for “malicious software,” is software that can be used to disrupt device operations, damage data, gather sensitive information, or gain access to private computer systems without the user's knowledge or consent. Examples of such malware include software viruses, trojan horses, rootkits, ransomware, etc. Correctly identifying which files or applications contain malware and which are benign can be a difficult task, because malware developers often obfuscate various attributes of the malware in an attempt to avoid detection by anti-malware software.

BRIEF DESCRIPTION OF THE DRAWINGS

For a better understanding of the inventive subject matter, reference may be made to the accompanying drawings in which:

FIG. 1 is a block diagram illustrating data flow between components of a system for observing and classifying device events according to embodiments.

FIG. 2 is a block diagram illustrating a software environment for observing and classifying device events according to embodiments.

FIG. 3 is a block diagram illustrating an event flow between an application and a service in a system for observing and classifying device events according to embodiments.

FIG. 4 is a block diagram illustrating components of a software environment for updating a dynamic model in a system of observing and classifying device events according to embodiments.

FIG. 5 is a flow chart illustrating a method for initializing observation and classification of device events according to embodiments.

FIG. 6 is a flow chart illustrating a method for filtering observed and classified device events according to embodiments.

FIG. 7 is a block diagram of an example embodiment of a computer system upon which embodiments of the inventive subject matter can execute.

DETAILED DESCRIPTION

In the following detailed description of example embodiments, reference is made to the accompanying drawings that form a part hereof, and in which is shown by way of illustration specific example embodiments in which the inventive subject matter may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the inventive subject matter, and it is to be understood that other embodiments may be utilized and that logical, mechanical, electrical and other changes may be made without departing from the scope of the inventive subject matter.

Some portions of the detailed descriptions which follow are presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the ways used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of steps leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like. It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussions, terms such as “processing” or “computing” or “calculating” or “determining” or “displaying” or the like, refer to the action and processes of a computer system, or similar computing device, that manipulates and transforms data represented as physical (e.g., electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.

The description of the various embodiments is to be construed as examples only and does not describe every possible instance of the inventive subject matter. Numerous alternatives could be implemented, using combinations of current or future technologies, which would still fall within the scope of the claims. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the inventive subject matter is defined only by the appended claims.

Various techniques are used in practice for malware detection on mobile devices such as smart phones. One such technique, behavioral analysis, uses observation of process behavior to heuristically detect malicious intent. In practice, the design of the malware detection system should probe the system at specific locations of interest and compare the behavior against the behavior of known malware. The malware detection system performs the detection in two steps. During the first step, an observation is performed to collect events of interest. During the second step, the events of interest in aggregate are filtered and classified. A behavioral vector results from the classification of the aggregated events, and it can be further processed into a heuristic, or confidence level. The confidence level can be compared against an objective metric to determine whether or not the observation is malicious behavior. Observed malicious behavior can be correlated with the target application to determine whether an application is benign, malicious, or undetermined.

Definitions

As used herein, a hook is defined as a small piece of code used to gather information on a “feature” usage inside a system. Information obtained by the hook during program execution can provide data on the feature. When executed on a machine processor, a hook can redirect program execution to a set of instructions written to perform analysis.

As used herein, a behavioral feature is defined as a module or line of code which will generate an event of interest during program runtime. By adding one or more “hooks” in a system of interest at the locations of interest, an observer will generate an aggregate of feature events as necessary to generate a behavioral vector.

As used herein, a behavioral vector is defined as a set of behavioral features which define the behavior of interest. The machine learning model can transform the behavioral vector using any linear or non-linear methods to produce a result. The result is used as a heuristic for malware detection.

FIG. 1 is a block diagram illustrating data flow between components of a system 100 for observing and classifying device events according to embodiments. In some embodiments, system 100 includes a computing device 102 having an analysis software environment 104 that includes an observer engine 106, an analyzer 108, and a model 110. System 100 can further include a modeler 112 that can be based on the computing device 102, or can be external to computing device 102 as shown in FIG. 1.

Computing device 102 can be any type of smart phone. However, the inventive subject matter is not limited to smart phones, and in alternative embodiments, computing device 102 can be a tablet computer, a media player (e.g., MP3 player), a laptop computer, a desktop computer, etc. In some embodiments, computing device 102 executes the Android operating system. In alternative embodiments, the computing device 102 executes a version of the Linux operating system. In further alternative embodiments, the computing device can execute a version of the iOS operating system.

Observer engine 106 can acquire and process feature usage for an application or service running on a computing device 102. The observer engine 106 can maintain a behavioral vector for each application, and can update the behavioral vector as feature usages by an application are detected.

Model 110 can include data defining the features of interest. In some embodiments, model 110 can be a binary blob that includes an evaluation model and the list of features of interest. The evaluation model can be a program that receives as input a per-application behavioral vector, and returns a confidence value. In some embodiments, the list of behavioral features describes instrumentation points into the system services, the data type of the feature, and the feature's position inside of the behavioral vector.

Analyzer 108 can be a service that can receive a model 110, evaluate a model 110, and notify interested parties of the evaluation result. In some embodiments, analyzer 108 can receive a model 110, unpack a binary blob received from a higher-level service and split the model into two parts, the evaluation program and the list of features of interest. Evaluating a model means running the evaluation program against the per-application behavioral vector to determine a confidence value. In some embodiments, notifying interested parties can include providing an interface, through a poll/read mechanism or a broadcast or callback method, to alert a mobile application that an event has occurred, namely that an application is malware.
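As an illustration of what an evaluation program might look like, the following minimal sketch maps a per-application behavioral vector to a confidence value. It assumes a logistic-regression form, which is one of the computational methods the disclosure names; the weights, the bias, and the function name evaluate are hypothetical and are not taken from the disclosure.

```python
import math

# Hypothetical parameters: a real model would carry these inside the binary blob.
WEIGHTS = [0.8, 1.5, -0.3]
BIAS = -2.0


def evaluate(behavioral_vector):
    """Return a confidence in (0, 1) from a list of per-feature counters."""
    if len(behavioral_vector) != len(WEIGHTS):
        raise ValueError("vector length must match the model's feature count")
    # Linear combination of the counters, squashed by the logistic function.
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, behavioral_vector))
    return 1.0 / (1.0 + math.exp(-z))
```

In this sketch, the position of each counter in the list plays the role of the feature's position inside the behavioral vector.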

The modeler 112 can include machine-learning techniques, calculations, and computational methods used to generate a model 110. The modeler 112 is an abstraction for a machine learning process used to determine which feature set (out of tens of thousands of features) to use and the computational method (e.g., logistic regression, random forest) to program in the evaluation model. Various processes can run on an emulator farm to train the model. The modeler can receive event data from these processes as part of the supervised learning process and use the data to train the model.

During operation, applications and services executing on computing device 102 can generate many events. These events can be captured and placed into an event stream 114 that is provided to observer engine 106 and modeler 112. In some embodiments, these events comprise inter-process communication events that occur between an application and a service. Observer engine 106 can provide selected events 116 to analyzer 108. The events can be selected based on features of interest, and such features of interest can be dynamically updated.

Further details on the various components of the above described system are provided below with respect to FIG. 2.

FIG. 2 is a block diagram illustrating a software environment 200 for observing and classifying device events according to embodiments. In some embodiments, software environment 200 runs in an Android operating system environment. The architecture sandboxes client applications and system applications in user-space memory. Communications between applications and services can be facilitated through IPC (Inter-Process Communication), message passing, shared memory, or other communication facility.

In some embodiments, client applications 210 and 212 communicate with services (e.g., native services 230) using messages sent via an IPC mechanism referred to as “binder.” The binder IPC mechanism is provided on Android and other Linux based operating systems. Binder can include a binder driver 218 that resides in the kernel space, and a binder framework 216 that resides in user space. The binder framework 216 can be considered “glue” code running in the user-space to facilitate communication with the kernel driver by an application (210 and/or 212) and service 230. Binder generally refers to both the kernel driver implementation (e.g., binder driver 218) and the user-space binder framework 216 together.

Binder can provide extensive insight into the runtime behavior of an Android device at varying discrete levels of coarseness. A binder transaction can provide an information-dense view of the communication chain between two Android processes. For each transaction, the communication protocol includes both meta-data about the sender and receiver along with flattened data from objects called parcels. A parcel is thus a container for the data that is the subject of the transaction. Observations performed on the binder transaction can provide high visibility into the inter-process interaction because system calls are typically made through the client-server message passing mechanism which is part of the binder design. For a specific application (210 and/or 212), aggregating the binder calls of that application can be used to profile the application and determine its behavior.

In some embodiments, logger 220, analyzer 108, and observer engine 106 operate to observe and analyze binder calls to create behavioral vectors 228 that can be used to analyze an application's behavior and determine if the application is malicious (i.e., contains malware). Each behavioral vector represents the observed model-specific features for an application. The behavioral vector serves as a storage location for observed events. In some embodiments, each behavioral vector can be a list of counters, where a counter can be associated with each feature in the model, and where the counter increments if an event involving a feature is observed. In other words, if the method is called for a specific service, and the method is deemed “interesting”, meaning that the method is a feature included in the model, then the counter for that feature will increment, provided that the observer engine 106 is executing and enabled. In alternative embodiments, the behavioral vector can include a sliding window of event transactions that are observed in addition to, or instead of, counters.
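The two representations described above, a list of counters and a sliding window of observed transactions, can be sketched together as follows. The class name BehavioralVector, the method observe, and the default window size are hypothetical choices made only for illustration.

```python
from collections import deque


class BehavioralVector:
    """Per-application storage for observed feature events (illustrative only)."""

    def __init__(self, num_features, window_size=4):
        # One counter per feature in the model.
        self.counters = [0] * num_features
        # Bounded history of the most recent feature indices; old entries fall off.
        self.window = deque(maxlen=window_size)

    def observe(self, feature_index):
        # Called when a hooked method that maps to feature_index is invoked.
        self.counters[feature_index] += 1
        self.window.append(feature_index)
```

For example, after observing features 0, 2 and 2 with a window size of 2, the counters hold one event for feature 0 and two for feature 2, while the window retains only the last two events.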

Logger 220 can be a set of “hooks” (small pieces of code used to gather feature usage inside the system). The hooks can be distributed around various components of the Android OS as desired to observe requested features. For example, hooks can be distributed to monitor aspects of the Android runtime 232 and/or native services 230 provided by the Android OS. In some embodiments, the hooks correspond to API calls that can also be found in the binder framework 216. Thus, the logger 220 doesn't need to reside in the kernel space itself. In alternative embodiments, the hooks can span across both user space and the kernel as some of the potentially observed features can be in the kernel only. Changes in the feature usage as determined by the various hooks can be delivered to the observer engine 106 for further processing. The feature usage can be later used to distinguish between malicious and clean applications.

Observer engine 106 can acquire and process feature usage information. As described above, since at least some of the hooks of the logger 220 can potentially reside in kernel space, in some embodiments, the observer engine 106 resides in the kernel space as well to ensure the performance impact of passing the observed features on the overall system is negligible. The observer engine 106 can process the observed features as reported by hooks of logger 220 and determine whether changes are to be made to the behavioral vectors 228 of the respective applications. The behavioral vectors 228 can be stored in a preallocated memory structure. In response to any change to the behavioral vectors 228, a security service 222 can be notified and the changed behavioral vectors can be delivered to an analyzer 108 component of the security service 222. The observer engine 106 can also react to changes in the observed features as requested by the analyzer 108 as will be further described below.

In some embodiments, an observation of an application is performed for a predetermined or configurable threshold amount of time and/or number of events on the first launch of an application, with the expectation of generating a behavioral vector which can be analyzed. In such embodiments, the analysis is not performed and the application is not classified unless the desired threshold of time or events is met.

Security service 222 can be a “connecting” module between observer engine 106 and Software Development Kit (SDK) 214. In some embodiments, security service 222 can include analyzer 108 as a submodule that classifies behavior of applications. In alternative embodiments, analyzer 108 can be separate from security service 222. Security service 222 can receive behavioral vectors from the observer engine 106 and pass them to the analyzer 108. Results from the analyzer 108 can then, if needed, be propagated to the SDK 214. The security service 222 can also handle updates of the model 110 in the analyzer 108. For example, in some embodiments, a model 110 can be passed to the security service 222 via the SDK 214. The security service 222 can then verify the model 110 and update the model in the analyzer 108. Security service 222 can also inform the respective modules in the stack (i.e., logger 220, observer engine 106, etc.) about the features to be observed requested by the current model 110. Further details on the workflow of the model updates are provided below.

SDK 214 can be a front end interface for a security application 208 (i.e., an anti-virus or other anti-malware application) to interact with the system. The SDK 214 can include Application Program Interfaces (APIs) for authenticating an application (e.g., applications 210 and/or 212). SDK 214 can also include APIs for controlling the behavior of the system as a whole and updating the model 110. The SDK 214 can propagate results of the classification done in the analyzer 108 to the security application 208 so that it can react accordingly to events that are currently happening on the device. SDK 214 can be provided in binary form with appropriate documentation to security application developers. Additionally, in some embodiments, the API for SDK 214 can include methods to enable/disable monitoring for an application, quarantine an application, and/or disable an application package.

The above-described components and APIs can be used in various workflows. Such workflows can include updating a model 110, vectorizing a behavioral observation, evaluating a model 110 on a package/application vector, and notifying a security application 208 regarding an application.

Pushing/Updating a Model into the Analyzer

A security application 208 can initiate this flow by issuing a call to a method in SDK 214 to push or update a model to the system. In some embodiments, this causes the model 110 to be sent to the security service 222 via a binder IPC. The security service 222 can verify the signature of the model 110, unpack it and split it into its component parts: the evaluation model itself and a list of behavioral features to observe. The evaluation model 110 can be a program that receives as input a per-application behavioral vector, and returns a confidence value. This value is later sent as an event to the security application 208 (see “Notifying a Security Application” flow below). The list of behavioral features describes the instrumentation points into the system services (e.g., native service 230 and runtime 232), the data type of the feature, and its position inside of the behavioral vector. In some embodiments, each feature is uniquely identified from the corresponding service's “binder interface” (also known as an Android Interface Definition Language (AIDL) file if automatically generated) with an opcode. These opcodes are defined before or during compilation of the “binder interface”. The opcode and service pair can uniquely identify the feature.
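The feature list described above can be pictured as a table keyed by the service and opcode pair. The sketch below is illustrative only: the Feature record, its field names, the function index_features, and the example service names are hypothetical, and signature verification and blob unpacking are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    service: str    # name of the system service (hypothetical values below)
    opcode: int     # numeric transaction code of the method in that service
    data_type: str  # how the feature is stored, e.g. "counter"
    position: int   # index of the feature inside the behavioral vector


def index_features(features):
    """Build a lookup table keyed by the (service, opcode) pair."""
    table = {}
    for feature in features:
        key = (feature.service, feature.opcode)
        if key in table:
            # The pair is meant to identify a feature uniquely.
            raise ValueError(f"duplicate feature {key}")
        table[key] = feature
    return table


example_table = index_features([
    Feature("location", 3, "counter", 0),
    Feature("telephony", 3, "counter", 1),
])
```

Note that the same opcode value can appear under two services without conflict, because the opcode alone is not the key.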

Once the security service 222 has unpacked the model 110, the security service 222 can notify system services that a new model 110 is available, to which a thread (e.g., updater thread 408, FIG. 4) running on each service can respond by reading the new features list and applying the instrumentation points if needed. This mechanism works the same whether it is a first initialization or a later update of the model 110. In some embodiments, when a new model 110 is pushed, the per-application behavioral vectors can be reset.

Vectorizing a Behavioral Observation

This workflow is processed when a method call to a system service is deemed of interest, i.e., the feature is in the model-specific instrumentation list. Several checks can be performed to decide if the behavior needs to be accounted for. These checks can be performed in an order that rules out the most common cases first, so that the behavioral code gets out of the way and the IPC call proceeds as fast as possible. This is desirable, because it can minimize the overhead imposed by the observer engine 106.

In some embodiments, the system checks to determine if the system service is being instrumented. The system can then check to confirm that the specific method being intercepted needs to be counted. The system can further check to confirm that the caller is an application that is under active evaluation (i.e., it has not been disabled). In alternative embodiments, the system can extend each check to further include conditions as determined by feature list 234. Such conditions could be, for example, whether the application calling this binder method is the current foreground application, or whether some other feature has been detected before (what is usually known as a quadratic or, more generally, non-linear event chain). As noted above, the order of these checks can vary depending on how each individual check affects the overhead of monitoring the IPC transaction.
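The sequence of checks can be sketched as a chain of early returns. This is only one possible ordering, chosen for illustration; the function should_record and its parameters are hypothetical, and the additional conditions drawn from the feature list (such as the foreground test) are omitted.

```python
def should_record(service, opcode, caller_uid,
                  instrumented_services, hooked_opcodes, disabled_uids):
    """Return True only if the call must update the caller's behavioral vector."""
    # Check 1: is this system service being instrumented at all?
    if service not in instrumented_services:
        return False
    # Check 2: does the intercepted method need to be counted?
    if opcode not in hooked_opcodes.get(service, ()):
        return False
    # Check 3: is the caller still under active evaluation?
    if caller_uid in disabled_uids:
        return False
    return True
```

Each failed check returns immediately, so a call that is not of interest costs only the checks that precede the first failure.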

Once these checks have passed, the system can modify the per-application behavioral vector in the model 110. In some embodiments, the system can set a flag that the analyzer 108 can use to decide which applications need to be evaluated again.

Evaluating a Model on a Package Vector

In some embodiments, security service 222 can wait for notifications of new behavioral information. Whenever security service 222 determines that a behavioral vector for an application has changed, it can evaluate the model 110 using this changed behavioral vector as an input parameter. Models 110 can include evaluation models that are compiled as small executable programs to reduce any overhead during the evaluation. In some embodiments, the security service 222 can have direct access to the map of behavioral vectors 228 to avoid the need to request or copy the vector's buffer. These two design decisions, compiled models and zero-copy semantics, can make the whole model evaluation more efficient when compared to typical systems.

Notifying a Security Application

When an evaluation model returns a confidence value above a configurable threshold, an event including relevant information about the package involved can be created and sent to the security application 208 (e.g., using a binder IPC call). The receiver of this callback can make a final determination as to how to react to this potential threat. Multiple events might be created for the same application. A security application 208 can perform any of several actions when an event is received. For example, the security application 208 can use the APIs provided by the SDK 214 to quarantine an app, disable the monitoring for the app if it is considered a false positive, or just do nothing and keep collecting behavioral information if the confidence is not high enough yet. Additionally, in some embodiments, the security application can upload the event data to modeler 112 in order to improve the training of the model 110.
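The evaluation and notification flows can be sketched together: only applications whose vectors changed are re-evaluated, and an event is produced only when the confidence exceeds the threshold. The function evaluate_changed, the dirty set, and the event dictionary layout are hypothetical; the evaluation function is passed in as a parameter and stands in for the compiled evaluation model.

```python
def evaluate_changed(vectors, dirty, evaluate, threshold):
    """Re-evaluate changed vectors and return events for confident detections."""
    events = []
    # Sorted only to make the order of events deterministic in this sketch.
    for package in sorted(dirty):
        confidence = evaluate(vectors[package])
        if confidence > threshold:
            events.append({"package": package, "confidence": confidence})
    # Every pending change has been handled; wait for new notifications.
    dirty.clear()
    return events
```

A receiver of these events would then decide how to react, for example by quarantining the package or by continuing to collect behavioral information.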

It should be noted that although FIG. 2 has been discussed in the context of the Android OS and binder, the embodiments are not so limited. For example, the IPC framework in iOS is known as XPC. Similar in function to binder, XPC provides a client-service remote procedure call mechanism which relays calls between clients and services. XPC is also similar to binder in how it packages data: just as binder parcels are containers that can include flattened objects, XPC flattens objects into serialized property lists. Thus, the iOS XPC can be considered the iOS equivalent to binder in Android. As a result, one of ordinary skill in the art having the benefit of the disclosure will appreciate that the aspects of the inventive subject matter described herein can be applied to iOS and XPC.

Further, it should be noted that FIG. 2 illustrates embodiments in which the observer engine 106 executes in kernel space, and the behavioral vectors are stored in kernel space. In these embodiments, observer engine 106 can be implemented as a device, and UNIX ioctl calls can be used to communicate with the observer engine 106. In alternative embodiments, either or both observer engine 106 and behavioral vectors 228 can reside in user space as part of the Android OS. In these embodiments, observer engine 106 can be implemented as an OS daemon and accessed through UNIX sockets or other IPC mechanisms.

FIG. 3 is a block diagram illustrating an event flow between an application 302 and a service 304 in a system for observing and classifying device events according to embodiments. An application 302 can initiate a request for a service provided by service 304. The request can be provided to a service transaction framework 306 via communication 1. In some embodiments, service transaction framework 306 can be the Android binder framework. In alternative embodiments, service transaction framework 306 can be iOS XPC. Interfaces between application 302 and service transaction framework 306 can perform marshalling and serialization of data so that the data in communication 1 is in a format expected by the service transaction framework 306.

The service transaction framework 306 can forward the request to kernel IPC component 308 via communication 2. In some embodiments, kernel IPC component 308 can be a binder driver, and communication 2 can be performed using an ioctl system call to the driver. Service transaction framework 306 can perform any required marshalling and serialization of the data so that the data is in a format expected by kernel IPC component 308.

A dispatcher (not shown) in kernel IPC component 308 forwards the request to filter 310 via communication 3. The request is in a format expected by the service transaction framework 306. In some aspects, filter 310 can be a modified version of the Binder call IPCThreadState::executeCommand(). Filter 310 can receive communication 3 and check for the existence of a hooks table by testing a pointer to the hooks table. If the pointer exists, a hooks table was loaded as part of a previous transaction request, and the filter can access the table. The hooks table comprises a table of hooks that correspond to the features to be observed as specified in the features list. In some embodiments, the filter 310 can be part of a modified binder framework that replaces the standard binder version of the framework. In alternative embodiments, filter 310 can be part of a modified XPC framework. In addition to filter 310, the modified framework can include modifications to perform “hooking” and to run the analyzer 108.

Filter 310 can check if the transaction code provided in the request is inside the boundaries of the hooks table. In some embodiments, the transaction code comprises the “numeric transaction code” represented by the opcode from the RPC interface known as AIDL. Each opcode uniquely defines a single callable method from the service. If the code is found in the hooks table, the transaction request (e.g., a binder or XPC call) can be logged. In other words, if the opcode index is set, then the opcode is hooked.

Because the code for hooking is on the request path, it is desirable for the program to resume its normal operation as quickly as possible. Thus, in some embodiments, provisions to remove processing overhead are made using pre-computation techniques. Additionally, in some embodiments, no memory allocation, searching, or hashing is performed during the hook.
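The filter test can be sketched as a bounds check followed by a direct index into a precomputed table. The layout assumed here, a list indexed by opcode whose entries hold either None or the feature's position in the behavioral vector, is a hypothetical choice for illustration, as is the function name lookup_hook.

```python
def lookup_hook(hooks_table, opcode):
    """Return the feature position for a hooked opcode, or None if not hooked."""
    # No table loaded yet (the pointer test in the description).
    if hooks_table is None:
        return None
    # The transaction code must fall inside the boundaries of the table.
    if not 0 <= opcode < len(hooks_table):
        return None
    # Direct indexing: no searching, hashing, or allocation on the request path.
    return hooks_table[opcode]


# Precomputed once when the feature list is loaded: opcodes 1 and 3 are hooked.
example_hooks = [None, 0, None, 1]
```

Because the table is built ahead of time, the per-request cost is limited to two comparisons and one index operation.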

Filter 310 forwards the request to service transaction framework 306 via communication 4. Service transaction framework 306 then forwards the request to service 304 via communication 5.

Service 304 can process the request, and the results of the request can be returned to the application 302 through service transaction framework 306 and kernel IPC component 308 via communications 6-9.

Placing the filter 310 between a dispatcher in kernel IPC component 308 and the service transaction framework 306 can be desirable as it can ensure that all transaction requests will pass through filter 310 and that all transaction requests that are in the model's feature list can be observed. At the filtration point, the classification filter 310 can follow rules to ignore transaction requests which represent caller functions which are not in the feature list.

FIG. 4 is a block diagram illustrating components for updating a dynamic model in a system of observing and classifying device events according to embodiments. In the example illustrated in FIG. 4, two services, service A 414 and service B 416, provide services to applications running on a computing device. Service A 414 and service B 416 can have respective thread pools (410, 412) that comprise groups of available dispatcher threads (404, 406) that perform functions provided by their respective services.

In some embodiments, the composition of the aggregated behavioral vector 402 can be modified by loading a feature list 234 during runtime of a service (e.g., service A 414, service B 416). In some aspects, the aggregated behavioral vector 402 comprises an aggregate of all behavioral vectors for each application, stored in a contiguous memory location as initialized by the observer engine 106. The feature list 234 can be stored and updated from a binary section of the machine-learning model 110 to be loaded on device boot or to be updated during runtime. The feature list 234 can be dynamically loaded into the filter 310 (FIG. 3) and used to observe transactions between applications and services (e.g., binder or XPC transactions). Hooking may only be performed for transaction requests that match the filter rules. Instead of placing static hooks in the system in the code-path between the client and the server, namely at the client-side function call as in previous approaches, a targeted filtration point is chosen on the receiving end at the IPC dispatcher on the service end. Prior to dispatch and after receiving the IPC call from the kernel IPC component 308 (FIG. 3), the filter 310 tests whether or not to hook the call. If the filter determines that the call is to be hooked, the hook sets the flag in the behavioral vector 402 to signal that a feature has been observed.

Updater thread 408 waits for a flag to change and, when it does, updates the table of hooks from the observer engine 106. When an updated model 110 is loaded into the security driver 208, a flag can be set that instructs the thread to dereference the old hooks table and reference the new hooks table. In some embodiments, futex (fast user space mutex) calls are used to cause the updater thread 408 to be taken out of the scheduling loop until such a change occurs. That is, the updater thread 408 won't typically waste any CPU cycles.
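The waiting behavior of the updater thread can be approximated in a short sketch. Python does not expose futex calls directly, so threading.Event is used here as a stand-in that likewise blocks the thread without spinning; the class HooksHolder and its methods are hypothetical and are not the disclosure's implementation.

```python
import threading


class HooksHolder:
    """Holds the current hooks table and swaps it when a new one is published."""

    def __init__(self, table):
        self.table = table
        self._pending = None
        self._stop = False
        self._changed = threading.Event()  # stand-in for the futex-backed flag
        self._applied = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def publish(self, new_table):
        # Called when an updated model is loaded: stage the table, raise the flag.
        self._pending = new_table
        self._applied.clear()
        self._changed.set()

    def wait_applied(self, timeout):
        return self._applied.wait(timeout)

    def stop(self):
        self._stop = True
        self._changed.set()
        self._thread.join(timeout=2.0)

    def _run(self):
        while True:
            self._changed.wait()  # sleeps here; consumes no CPU while idle
            self._changed.clear()
            if self._stop:
                return
            # Drop the reference to the old table and reference the new one.
            self.table = self._pending
            self._applied.set()
```

Readers of the table see either the old or the new reference, since the swap in this sketch is a single attribute assignment.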

FIG. 5 is a flow chart 500 illustrating a method for initializing observation and classification of device events according to embodiments. The method may, in some aspects, constitute computer programs made up of computer-executable instructions. Describing the method by reference to a flowchart enables one skilled in the art to develop such programs including such instructions to carry out the method on suitable processors (the processor or processors of the computer executing the instructions from computer-readable media). The method illustrated in FIG. 5 is inclusive of acts that may be taken by an operating environment executing an example embodiment of the invention.

In some embodiments, the method illustrated in FIG. 5 is executed by a service when a first binder transaction is received for the service. For example, the method of FIG. 5 can be executed in the Binder class constructor.

At block 502, a service connects to a security driver 208. In some embodiments, the service connects to the security driver when a new binder service object is created in the service.

At block 504, the service maps a feature list (i.e., a list of hooks). In some embodiments, a service accesses the binder driver on initialization to map a shared memory buffer where the feature list is stored. Memory mapping the shared memory buffer during initialization allows the service to reference the feature list without any copy semantics and to create the reference only once, during service initialization. The reference to the feature list is recreated by the updater thread (FIG. 4, 408) when a new model is loaded.
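The benefit of referencing a mapped buffer rather than copying it can be illustrated with a minimal Python sketch. An anonymous memory mapping stands in for the shared memory buffer provided through the driver. The buffer size and the variable names are assumptions made only for illustration.

```python
import mmap

FEATURE_LIST_BYTES = 4096  # hypothetical size of the shared buffer

# Anonymous mapping standing in for the driver-provided shared memory buffer.
shared = mmap.mmap(-1, FEATURE_LIST_BYTES)

# The reference is created once; a memoryview exposes the mapping without copying it.
feature_view = memoryview(shared)

# A later write into the mapping (as the producer of the feature list might do) ...
shared[0:4] = b"\x01\x02\x03\x04"

# ... is visible through the existing reference, with no re-mapping and no copy.
first_bytes = bytes(feature_view[0:4])
```

Because the view is only a reference to the mapping, the service in this sketch never has to re-read the list after the one-time setup.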

At block 506, the service maps application behavioral vectors. As noted above, the application “behavior” is tracked in a behavioral vector. In some embodiments, a dispatcher thread (FIG. 4, 404 and 406) in the service of interest performs the behavioral vector modification. In such embodiments, the service's dispatcher thread (FIG. 4, 404 and 406), or more generically, the service, has a reference to the storage location where the “behavior” of all applications is tracked (e.g., behavioral vectors 228). In some embodiments, the storage location is a shared-memory mapping initialized by the kernel and referenced by the service through an ioctl call. In some embodiments, the storage location can be a contiguous block of memory apportioned with a fixed size per application. The offset for each application can be a multiple of the application's UID.
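The fixed-size, UID-indexed layout described above can be sketched as follows. This is a minimal Python illustration in which a bytearray stands in for the shared-memory mapping. The slot size, the UID bound, and the function names are hypothetical values chosen for the example.

```python
VECTOR_BYTES = 64   # hypothetical fixed size per application (room for 512 feature bits)
MAX_UID = 20000     # hypothetical upper bound on application UIDs

# One contiguous block holding the behavioral vectors of all applications.
aggregate = bytearray(VECTOR_BYTES * MAX_UID)


def slot_offset(uid):
    """Byte offset of an application's vector: a multiple of its UID."""
    if not 0 <= uid < MAX_UID:
        raise ValueError("UID outside the mapped region")
    return uid * VECTOR_BYTES


def _byte_and_mask(uid, feature_index):
    if not 0 <= feature_index < VECTOR_BYTES * 8:
        raise ValueError("feature index outside the vector")
    return slot_offset(uid) + feature_index // 8, 1 << (feature_index % 8)


def set_feature(uid, feature_index):
    """Set the flag recording that the application exhibited the feature."""
    position, mask = _byte_and_mask(uid, feature_index)
    aggregate[position] |= mask


def has_feature(uid, feature_index):
    position, mask = _byte_and_mask(uid, feature_index)
    return bool(aggregate[position] & mask)
```

With this layout, locating an application's vector requires only a multiplication, so a dispatcher thread does not need a lookup table keyed by application.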

At block 508, the service determines the hooks for the service. In some embodiments, hooks are specific to a service. The feature list can be a list of method calls from all services that are relevant to the machine learning model 110. Each service's binder interface definition can include a list of opcodes, each opcode corresponding to a callable method. The service maps into memory a table (or array) indicating which opcodes for the current service are currently enabled for observation, by setting one or more bits at the location in the array representing each opcode. In other words, a flag is set for each opcode to log and unset for each opcode to ignore. The result of the mapping comprises a hooks table.

The service can determine the hooks (or hooked opcodes) by comparing the available opcodes for the current service with the service/opcode list in the feature list. Opcodes that match the current service are flagged in the hooks table.
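One possible way to build such a hooks table is sketched below in Python. The table is represented as an integer bitmask in which bit n is set when opcode n is to be observed. The function names, the service names, and the opcode values are hypothetical.

```python
def build_hooks_table(service_name, service_opcodes, feature_list):
    """Return a bitmask with bit n set when opcode n of this service is hooked.

    feature_list is an iterable of (service, opcode) pairs taken from the model.
    """
    wanted = {opcode for (service, opcode) in feature_list if service == service_name}
    table = 0
    for opcode in service_opcodes:
        if opcode in wanted:
            table |= 1 << opcode      # flag set: log this opcode
    return table                      # unset bits: ignore those opcodes


def is_hooked(table, opcode):
    return bool((table >> opcode) & 1)


# Hypothetical feature list covering two services.
features = [("location", 3), ("location", 7), ("telephony", 2)]
location_hooks = build_hooks_table("location", range(1, 10), features)
```

Here only opcodes 3 and 7 are flagged for the hypothetical "location" service, and the entry for "telephony" is ignored because it belongs to a different service.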

At block 510, the service spawns a thread (e.g., updater thread 408, FIG. 4) whose purpose is to wait for a flag to change and, when it does, update the table of hooks from the driver. When an updated model 110 is loaded into the security driver 208, a flag can be set that instructs the thread to dereference the old hooks table and reference the new hooks table. As noted above, in some embodiments, futex calls can be used to cause the updater thread 408 to be taken out of the scheduling loop until such a change occurs. In some embodiments, the updater thread 408 is started for each process of the service, not for each service.

FIG. 6 is a flow chart 600 illustrating a method for filtering device events according to embodiments. In some embodiments, the filter is designed to minimize overhead for the most common case, and can test at three levels of granularity: the service level, the method level, and the application level.

At block 602, the method receives an application event. In some embodiments, the event is a Binder transaction between an application and a service (or a service and another service). In alternative embodiments, the event can be an XPC transaction between an application and a service.

At block 604, the filter 310 determines if the service is hooked (i.e., the features of the service are being observed). In some embodiments, this can be done in one pointer comparison to determine if a hook table/feature list has been loaded. The service-level filter rule at block 604 checks to see if the recipient of the binder call matches a service in the loaded model 110. The determination of which services to hook is related to the choice of which features to hook. The choice of which features to hook is determined using the modeler 112. If the service is not hooked, the method proceeds to block 612 to process the transaction as usual. If the service is hooked, the method proceeds to block 606.

At block 606, the filter 310 determines if the method is hooked. In some embodiments, the method-level filter rule of block 606 checks to see if one of the functions matches a function in the loaded model 110. In some embodiments, a feature is uniquely identified from the corresponding service's “Binder interface” (also known as an AIDL file if automatically generated) with an opcode. These opcodes are defined before or during compilation of the “Binder interface.” Since the opcode and service pair uniquely identifies the feature, the opcode and service pair can be used to determine if a method is hooked. If the method is not hooked, the method proceeds to block 612 to process the transaction as usual. If the method is hooked, the method proceeds to block 608.

At block 608, the filter 310 determines if the application is being evaluated. In some embodiments, the application-level filter rule at block 608 checks a whitelist of applications known to be benign. The whitelist can be maintained by binder driver 218. Alternatively, the filter rule can check a list of applications to be analyzed. If the application is not being analyzed, the method proceeds to block 612 to process the transaction as usual. If the application is being analyzed, the method proceeds to block 610.

At block 610, event data regarding the transaction can be sent to the observer engine 106 (FIG. 1).

The method then proceeds to block 612 to process the transaction as usual, and the method ends.
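By way of example and not limitation, the three filter checks of blocks 604 through 610 can be sketched in Python as shown below. The function name, the dictionary of per-service hooks tables, the whitelist set, and the list that collects observed events are hypothetical stand-ins for the filter 310, the loaded model 110, and the observer engine 106.

```python
def filter_transaction(service, opcode, uid, hooks_tables, whitelist, observed):
    """Apply the service, method, and application checks in that order.

    Returns the name of the level at which filtering stopped, or "observed"
    when the event was recorded. In every case the caller goes on to process
    the transaction as usual (block 612).
    """
    table = hooks_tables.get(service)
    if table is None:                    # block 604: service not hooked
        return "service"
    if not (table >> opcode) & 1:        # block 606: method not hooked
        return "method"
    if uid in whitelist:                 # block 608: application known to be benign
        return "application"
    observed.append((uid, service, opcode))   # block 610: report the event
    return "observed"


# Hypothetical state: one hooked service with opcode 3 enabled, one whitelisted UID.
hooks_tables = {"location": 1 << 3}
whitelist = {10005}
observed = []
```

Ordering the checks from coarsest to finest means that a transaction addressed to an unhooked service is rejected after a single lookup, which is the case the filter is designed to make cheapest.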

As can be appreciated from the above, the amount of overhead required to process the binder call depends on the depth of the filtering process. For binder calls that reach the application-level logic, the cost in machine cycles is relatively high. In the same manner, binder calls that reach only the service-level logic have a lower cost in machine cycles. The finest level of granularity, the application level, can be considered a relatively high-overhead transaction, while the method level and the service level each require progressively less overhead. The relative frequency of reaching each filter level is weighted towards the service-level check in the flow chart of FIG. 6. Those of skill in the art having the benefit of the disclosure will appreciate that the order of checking can be adjusted in the case that it is determined that, in a particular environment, the overhead and/or frequency associated with a level makes it desirable to perform the checks in a different order than the order shown in FIG. 6.

The inventors have determined that stopping at the service-level check is the common case, meaning that the filter logic passing the service-level check is a low-probability event. For this reason, the service-level check can be the most heavily optimized. To that end, the service-level logic consists of a one-time load to obtain a reference to the hooks table for that service. Any subsequent calls require only a pointer comparison to determine if the hooks table is loaded. This inexpensive check prevents the more costly second-level and third-level filter checks from occurring.

The same overhead cost can be extrapolated for the method-level filter check. The method-level check at block 606 can be accomplished in one load instruction and three comparisons in some embodiments.

The uncommon case, in which the two coarser-granularity checks result in positive matches and the application-level filter check at block 608 does not find the application in the whitelist, can be considered a high-overhead operation. The check at block 608 can be accomplished in approximately eighteen instructions in some embodiments. However, the machine-learning model 110 can be optimized to use both a small subset of total services and a small subset of all possible methods from those services. The checking can be further optimized by using a whitelist of applications.

In some embodiments, the total subset of features to be observed can be relatively small when compared to the total number of possible features. By way of example and not limitation, the number of features observed in particular embodiments may be in a range of 150-5000 out of approximately 50,000 total possible features. Those of skill in the art having the benefit of the disclosure will appreciate that the number of observed features and/or the number of possible features may vary in different embodiments, and may vary outside of the given range. By keeping the number of features relatively small when compared to the total number of features, the likelihood of the filter logic reaching the application level is expected to be low, because most transactions produce a negative match at a coarser level of the decision tree. For this reason, the filter design of some embodiments emphasizes a negative match at the service level and minimizes the depth of filter checking. This can be done by minimizing the number of features hooked, minimizing the number of methods hooked, and optimizing the application whitelist.

As will be appreciated from the above, some embodiments can provide improvements in computer and mobile device functionality. For example, the behavior-based system can result in improved detection of malware when compared to conventional systems that analyze code to look for code fragments that match known malware. Such conventional systems can be defeated by minor changes to the code that do not affect the malware's main purpose. Instead, the systems and methods described herein can analyze the actual behavior of the code rather than the code itself. This can result in improved detection of malware. Further, detection and removal of malware can result in improved functionality of the computer or mobile device.

With reference to FIG. 7, an example embodiment extends to a machine in the example form of a computer system 700 within which instructions for causing the machine to perform any one or more of the methodologies discussed herein may be executed. In alternative example embodiments, the machine operates as a standalone device or may be connected (e.g., networked) to other machines. In a networked deployment, the machine may operate in the capacity of a server or a client machine in a server-client network environment, or as a peer machine in a peer-to-peer (or distributed) network environment. Further, while only a single machine is illustrated, the term “machine” shall also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein.

The example computer system 700 may include a processor 702 (e.g., a central processing unit (CPU), a graphics processing unit (GPU) or both), a main memory 704 and a static memory 706, which communicate with each other via a bus 708. The computer system 700 may further include a touchscreen display unit 710. In example embodiments, the computer system 700 also includes a network interface device 720.

The persistent storage unit 716 includes a machine-readable medium 722 on which is stored one or more sets of instructions 724 and data structures (e.g., software instructions) embodying or used by any one or more of the methodologies or functions described herein. The instructions 724 may also reside, completely or at least partially, within the main memory 704 or within the processor 702 during execution thereof by the computer system 700, the main memory 704 and the processor 702 also constituting machine-readable media.

While the machine-readable medium 722 is shown in an example embodiment to be a single medium, the term “machine-readable medium” may include a single medium or multiple media (e.g., a centralized or distributed database, or associated caches and servers) that store the one or more instructions. The term “machine-readable medium” shall also be taken to include any tangible medium that is capable of storing, encoding, or carrying instructions for execution by the machine and that cause the machine to perform any one or more of the methodologies of embodiments of the present invention, or that is capable of storing, encoding, or carrying data structures used by or associated with such instructions. The term “machine-readable storage medium” shall accordingly be taken to include, but not be limited to, solid-state memories and optical and magnetic media that can store information in a non-transitory manner, i.e., media that is able to store information. Specific examples of machine-readable storage media include non-volatile memory, including by way of example semiconductor memory devices (e.g., Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), and flash memory devices); magnetic disks such as internal hard disks and removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. A machine-readable storage medium does not include signals.

The instructions 724 may further be transmitted or received over a communications network 726 using a signal transmission medium via the network interface device 720 and utilizing any one of a number of well-known transfer protocols (e.g., FTP, HTTP). Examples of communication networks include a local area network (LAN), a wide area network (WAN), the Internet, mobile telephone networks, Plain Old Telephone Service (POTS) networks, and wireless data networks (e.g., WiFi and WiMax networks). The term “machine-readable signal medium” shall be taken to include any transitory intangible medium that is capable of storing, encoding, or carrying instructions for execution by the machine, and includes digital or analog communications signals or other intangible medium to facilitate communication of such software.

Although an overview of the inventive subject matter has been described with reference to specific example embodiments, various modifications and changes may be made to these embodiments without departing from the broader scope of embodiments of the present invention. Such embodiments of the inventive subject matter may be referred to herein, individually or collectively, by the term “invention” merely for convenience and without intending to voluntarily limit the scope of this application to any single invention or inventive concept if more than one is, in fact, disclosed.

As is evident from the foregoing description, certain aspects of the inventive subject matter are not limited by the particular details of the examples illustrated herein, and it is therefore contemplated that other modifications and applications, or equivalents thereof, will occur to those skilled in the art. It is accordingly intended that the claims shall cover all such modifications and applications that do not depart from the spirit and scope of the inventive subject matter. Therefore, it is manifestly intended that this inventive subject matter be limited only by the following claims and equivalents thereof.

The Abstract is provided to comply with 37 C.F.R. § 1.72(b) to allow the reader to quickly ascertain the nature and gist of the technical disclosure. The Abstract is submitted with the understanding that it will not be used to limit the scope of the claims.

What is claimed is:
 1. A method for observing device events, the method comprising: receiving a transaction request from a client application for an operating system service; determining whether the operating system service, a method associated with the transaction request, and the client application are being observed; in response to determining that the operating system service, the method associated with the transaction request, and the client application are being observed, modifying a behavioral vector associated with the client application and processing the transaction request; and in response to determining that at least one of the operating system service, the method associated with the transaction request, and the client application are not being observed, processing the transaction request without modifying the behavioral vector associated with the client application.
 2. The method of claim 1, further comprising: receiving a set of features to be observed, the set of features to be observed associated with one or more methods of one or more services; wherein determining whether the operating system service, the method associated with the transaction request, and the client application are being observed includes determining whether the method associated with the transaction request comprises one of the one or more methods of the one or more services.
 3. The method of claim 2, wherein the transaction request is associated with a feature in the set of features to be observed, and wherein modifying the behavioral vector associated with the client application comprises setting a value in the behavioral vector to indicate that the method associated with the transaction request has been invoked.
 4. The method of claim 1, further comprising determining, based at least in part on the behavioral vector associated with the client application, that the client application comprises malware.
 5. The method of claim 4, wherein determining, based at least in part on the behavioral vector associated with the client application, that the client application comprises malware comprises correlating the behavioral vector associated with the client application with previously observed behavioral vectors associated with malware.
 6. The method of claim 1, wherein at least one of the operating system service and the method associated with the transaction request are implemented in Android as a component of binder.
 7. The method of claim 1, wherein at least one of the operating system service and the method associated with the transaction request are implemented in iOS as a component of XPC.
 8. A non-transitory machine-readable storage medium having stored thereon computer-executable instructions for observing device events, the computer-executable instructions to cause one or more processors to perform operations comprising: receive a transaction request from a client application for an operating system service; determine whether the operating system service, a method associated with the transaction request, and the client application are being observed; in response to a determination that the operating system service, the method associated with the transaction request, and the client application are being observed, modify a behavioral vector associated with the client application and process the transaction request; and in response to a determination that at least one of the operating system service, the method associated with the transaction request, and the client application are not being observed, process the transaction request without modification of the behavioral vector associated with the client application.
 9. The non-transitory machine-readable storage medium of claim 8, wherein the computer-executable instructions further comprise instructions to: receive a set of features to be observed, the set of features to be observed associated with one or more methods of one or more services; wherein the computer-executable instructions to determine whether the operating system service, the method associated with the transaction request, and the client application are being observed include instructions to determine whether the method associated with the transaction request comprises one of the one or more methods of the one or more services.
 10. The non-transitory machine-readable storage medium of claim 9, wherein the transaction request is associated with a feature in the set of features to be observed, and wherein modifying the behavioral vector associated with the client application comprises setting a value in the behavioral vector to indicate that the method associated with the transaction request has been invoked.
 11. The non-transitory machine-readable storage medium of claim 8, wherein the computer-executable instructions further comprise instructions to determine, based at least in part on the behavioral vector associated with the client application, that the client application comprises malware.
 12. The non-transitory machine-readable storage medium of claim 11, wherein the computer-executable instructions to determine, based at least in part on the behavioral vector associated with the client application, that the client application comprises malware comprise instructions to correlate the behavioral vector associated with the client application with previously observed behavioral vectors associated with malware.
 13. The non-transitory machine-readable storage medium of claim 8, wherein at least one of the operating system service and the method associated with the transaction request are implemented in Android as a component of binder.
 14. A system for observing device events, the system comprising: one or more processors; and a non-transitory machine-readable medium having stored thereon computer-executable instructions to cause the one or more processors to: receive a transaction request from a client application for an operating system service, determine whether the operating system service, a method associated with the transaction request, and the client application are being observed, in response to a determination that the operating system service, the method associated with the transaction request, and the client application are being observed, modify a behavioral vector associated with the client application and process the transaction request, and in response to a determination that at least one of the operating system service, the method associated with the transaction request, and the client application are not being observed, process the transaction request without modification of the behavioral vector associated with the client application.
 15. The system of claim 14, wherein the computer-executable instructions further comprise instructions to: receive a set of features to be observed, the set of features to be observed associated with one or more methods of one or more services; wherein the computer-executable instructions to determine whether the operating system service, the method associated with the transaction request, and the client application are being observed include instructions to determine whether the method associated with the transaction request comprises one of the one or more methods of the one or more services.
 16. The system of claim 15, wherein the transaction request is associated with a feature in the set of features to be observed, and wherein modifying the behavioral vector associated with the client application comprises setting a value in the behavioral vector to indicate that the method associated with the transaction request has been invoked.
 17. The system of claim 14, wherein the computer-executable instructions further comprise instructions to determine, based at least in part on the behavioral vector associated with the client application, that the client application comprises malware.
 18. The system of claim 17, wherein the computer-executable instructions to determine, based at least in part on the behavioral vector associated with the client application, that the client application comprises malware comprise instructions to correlate the behavioral vector associated with the client application with previously observed behavioral vectors associated with malware.
 19. The system of claim 14, wherein at least one of the operating system service and the method associated with the transaction request are implemented in Android as a component of binder.
 20. The system of claim 14, wherein at least one of the operating system service and the method associated with the transaction request are implemented in iOS as a component of XPC.